Convergence of Generators

Convergence of a generator estimated from data is more subtle than convergence of a transfer operator estimated from data, due to the presence of timescales in the former. For a fixed timestep size $\Delta t$ and infinite data, the empirical estimate converges to the transfer operator associated with that timestep. Taking the limit $\Delta t \rightarrow 0$ afterwards then yields convergence to the generator.
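
Concretely, for a generator $Q$, the exact transfer operator over a timestep $\Delta t$ is the matrix exponential, and the generator is recovered in the small-timestep limit:

\[\mathcal{M}(\Delta t) = \exp(Q \Delta t), \qquad Q = \lim_{\Delta t \rightarrow 0} \frac{\mathcal{M}(\Delta t) - I}{\Delta t},\]

where $I$ is the identity matrix.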

Given the exact generator

Q_exact = [-1.0 2.0; 1.0 -2.0]
Q_exact
2×2 Matrix{Float64}:
 -1.0   2.0
  1.0  -2.0

We generate Markov chains of increasing size at a fixed timestep $dt$ and compute the empirical transition matrix. We then verify that the empirical transition matrix converges to the exact transfer operator. Let us first generate a Markov chain of size 100.

using MarkovChainHammer.Trajectory: generate
dt = 1.0
steps = 100
markov_chain = generate(Q_exact, steps; dt=dt);

We now load the empirical operator constructors for both the generator and the transfer operator in order to separately check for convergence.

First let us construct the transfer operator

using MarkovChainHammer.TransitionMatrix: perron_frobenius
ℳ_empirical = perron_frobenius(markov_chain)
2×2 Matrix{Float64}:
 0.625  0.657143
 0.375  0.342857

We see that the above matrix is close to the exact transfer operator $\exp(Q \, dt)$ with $dt = 1$

ℳ_exact = exp(Q_exact * dt)
2×2 Matrix{Float64}:
 0.683262  0.633475
 0.316738  0.366525

Now let us construct the generator

using MarkovChainHammer.TransitionMatrix: generator
Q_empirical = generator(markov_chain; dt = dt)
2×2 Matrix{Float64}:
 -0.375   0.666667
  0.375  -0.666667

We see that the above matrix is quite far from the exact generator, which leads to the question "what happened?". The resolution to this dilemma is to observe that the timestep of the Markov chain was too large compared to the expected holding times of the exact generator, which are $1$ time unit for state 1 and $1/2$ a time unit for state 2.
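
The holding time of a state is the reciprocal of the magnitude of the corresponding diagonal entry of the generator; as a quick check (a minimal sketch using diag from the LinearAlgebra standard library):

using LinearAlgebra: diag
holding_times = -1 ./ diag(Q_exact)
2-element Vector{Float64}:
 1.0
 0.5

If the chain is instead generated with a smaller timestep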

dt = 0.1
markov_chain = generate(Q_exact, steps; dt=dt);
Q_empirical = generator(markov_chain; dt = dt)
2×2 Matrix{Float64}:
 -0.694444   1.42857
  0.694444  -1.42857

We can see that the estimate of the generator has improved. Note that we can also estimate the generator by taking the matrix logarithm of the empirical transfer operator and dividing by the timestep, which yields a similar answer

log(perron_frobenius(markov_chain)) / dt
2×2 Matrix{Float64}:
 -0.628269   1.59311
  0.628269  -1.59311

Let us now automate the process of checking convergence as a function of the fixed timestep size. For each timestep we generate Markov chains of increasing length and compute the empirical operators. We use the function norm from the LinearAlgebra package to compute the norm of the difference between the empirical and exact matrices, divided by the norm of the exact matrix, to get a relative error. At small timesteps the transfer operator converges to the identity matrix, hence we divide its error by a factor of $dt$ in order to make a fair comparison with the relative error of the generator. The takeaway message is that the error of the generator is bounded by both the timestep size and the available number of independent samples in the data. Here is the code to verify this statement

using LinearAlgebra: norm
println("Transfer Operator Convergence")
for dt in [1.0, 0.1, 0.01, 0.001]
    ℳ_exact_local = exp(Q_exact * dt) # exact transfer operator at this timestep
    println("For a timestep of $dt")
    for i in 2:2:6
        n = 10^i
        markov_chain_local = generate(ℳ_exact_local, n) # sample a chain of n steps
        number_of_states = 2
        ℳ_empirical_local = perron_frobenius(markov_chain_local, number_of_states)
        # relative error, rescaled by dt since the transfer operator tends to the identity
        empirical_error = norm(ℳ_empirical_local - ℳ_exact_local) / norm(ℳ_exact_local) / dt
        println("A chain of size 10^$i yields a relative empirical error of $(empirical_error)")
    end
    println("---------------------------")
end

println("---------------------------")
println("Generator Convergence")
for dt in [1.0, 0.1, 0.01, 0.001]
    ℳ_exact_local = exp(Q_exact * dt) # exact transfer operator at this timestep
    println("For a timestep of $dt")
    for i in 2:2:6
        n = 10^i
        markov_chain_local = generate(ℳ_exact_local, n) # sample a chain of n steps
        number_of_states = 2
        Q_empirical_local = generator(markov_chain_local, number_of_states; dt = dt)
        # relative error of the empirical generator; no dt rescaling needed here
        empirical_error = norm(Q_empirical_local - Q_exact) / norm(Q_exact)
        println("A chain of size 10^$i yields a relative empirical error of $(empirical_error)")
    end
    println("---------------------------")
end
Transfer Operator Convergence
For a timestep of 1.0
A chain of size 10^2 yields a relative empirical error of 0.2052272765020095
A chain of size 10^4 yields a relative empirical error of 0.0009399099727159542
A chain of size 10^6 yields a relative empirical error of 0.00044363459027215016
---------------------------
For a timestep of 0.1
A chain of size 10^2 yields a relative empirical error of 0.6406130397167876
A chain of size 10^4 yields a relative empirical error of 0.024372108654314568
A chain of size 10^6 yields a relative empirical error of 0.00969501376207933
---------------------------
For a timestep of 0.01
A chain of size 10^2 yields a relative empirical error of 13.527939471702238
A chain of size 10^4 yields a relative empirical error of 0.1376897600902883
A chain of size 10^6 yields a relative empirical error of 0.024057157177490955
---------------------------
For a timestep of 0.001
A chain of size 10^2 yields a relative empirical error of 2.2360632258701383
A chain of size 10^4 yields a relative empirical error of 1.3390790093116096
A chain of size 10^6 yields a relative empirical error of 0.06102579821063059
---------------------------
---------------------------
Generator Convergence
For a timestep of 1.0
A chain of size 10^2 yields a relative empirical error of 0.6603436539783516
A chain of size 10^4 yields a relative empirical error of 0.6791138371288149
A chain of size 10^6 yields a relative empirical error of 0.6838007838954553
---------------------------
For a timestep of 0.1
A chain of size 10^2 yields a relative empirical error of 1.3652450841832677
A chain of size 10^4 yields a relative empirical error of 0.139696015339922
A chain of size 10^6 yields a relative empirical error of 0.13947707201691414
---------------------------
For a timestep of 0.01
A chain of size 10^2 yields a relative empirical error of 1.8027440423029175
A chain of size 10^4 yields a relative empirical error of 0.05086091818483745
A chain of size 10^6 yields a relative empirical error of 0.01982217935355222
---------------------------
For a timestep of 0.001
A chain of size 10^2 yields a relative empirical error of 1.0
A chain of size 10^4 yields a relative empirical error of 0.1861324264839971
A chain of size 10^6 yields a relative empirical error of 0.033407350060732074
---------------------------

In particular, for the convergence of the generator, we see that the error is roughly on the order of the timestep size, except for the last case. A heuristic explanation for this behavior is as follows. If we think of each entry of the matrix as a random variable, then its convergence to the expected value (the exact entry of the matrix) is inversely proportional to the square root of the number of independent samples. As one decreases the timestep size with the number of steps held fixed, the number of independent samples decreases, since we are observing the simulation for an increasingly shorter amount of physical time. Roughly speaking, one would expect the error to be

\[\text{error} \propto \max \left(dt, \frac{1}{\sqrt{S}} \right)\]

where $S$ is the number of independent samples. In the last case, since the largest decorrelation time of the Markov chain is on the order of 1 time unit, we expect the number of independent samples in a physical time interval of $10^6 dt = 10^3$ to be roughly on the order of $10^3 / 5$, where 5 time units is taken as a decorrelation threshold for the Markov chain. We see that the error is

approximate_error = 1 / sqrt(10^3 / 5)
0.07071067811865475

which is roughly the same order of magnitude as the final error.
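
As a rough cross-check, we can tabulate the heuristic bound $\max(dt, 1/\sqrt{S})$ for each timestep used above, assuming a chain of $10^6$ steps and the same decorrelation threshold of 5 time units (a sketch of the heuristic, not an exact error estimate):

for dt in [1.0, 0.1, 0.01, 0.001]
    S = 10^6 * dt / 5 # independent samples in a physical time interval of 10^6 * dt
    heuristic_bound = max(dt, 1 / sqrt(S))
    println("timestep $dt: heuristic error bound $(heuristic_bound)")
end

The resulting bounds are of the same order of magnitude as the final errors reported for each timestep in the generator convergence output above.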

We haven't discussed the more nuanced behavior of both the generator and the transfer operator as the number of states increases. In general, the more states one has, the more data is needed to estimate the entries of the matrix; however, this need not always be the case, as there are interesting exceptions to this rule. In particular, if there is an underlying (or approximate) tensor product structure to the generator or transfer operator, then the number of independent samples needed to estimate the entries of the matrix can be reduced, as sketched below.
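
As a minimal sketch of what such a structure looks like, consider two independent chains with generators Q1 and Q2 (the generator Q2 below is a hypothetical example introduced only for illustration). The joint process on the product state space has a generator given by the Kronecker sum of the two factors:

using LinearAlgebra
Q1 = [-1.0 2.0; 1.0 -2.0] # the exact generator from above
Q2 = [-0.5 0.5; 0.5 -0.5] # hypothetical second generator, for illustration only
# Kronecker sum: the generator of two independent chains evolved jointly
Q_joint = kron(Q1, Matrix(1.0I, 2, 2)) + kron(Matrix(1.0I, 2, 2), Q2)

Estimating the two 2 × 2 factors separately requires resolving far fewer independent entries than estimating the 4 × 4 joint generator directly, which is the sense in which the required number of samples can be reduced.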